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Abstract. In this article, the solution of a statistical inverse problem M — AU + £ 
by the Bayesian approach is studied where (7 is a function on the unit circle T, i.e., a 
periodic signal. The mapping A is a smoothing linear operator and £ a Gaussian noise. The 
connection to the solution of a finite-dimensional computational model Alkn = AkU„+£k 
is discussed. Furthermore, a novel hierarchical prior model for obtaining edge-preserving 
conditional mean estimates is introduced. The convergence of the method with respect to 
finer discretization is studied and the posterior distribution is shown to converge weakly. 
Finally, theoretical findings are illustrated by a numerical example with simulated data. 

1. Introduction 

Reconstruction methods with edge-preserving or -enhancing properties are widely stud- 
ied topic in deterministic inverse problems. There exists a variety of different sophisticated 
approaches in the literature including functional regularization (e.g., the total variation ap- 
proach fl3l ) or geometrical methods (e.g., the level set methods P31 '). In the Bayesian 
inversion theory some methods have been introduced aiming for an edge-preserving point 
estimate in the finite-dimensional setting fT9l [8l IT2l l42l . Especially the work by Calvetti 
and Somersalo [10, 9| with hierarchical priors is closely related to this paper. In general 
it seems to be difficult to establish how the posterior distribution behaves asymptotically, 
i.e., as discretization of the problem gets finer. This is due to the fact that such methods 
usually require non-Gaussian prior modeling and the related infinite-dimensional Bayesian 
theory is not fully developed. This paper introduces a novel hierarchical structure leading 
to non-Gaussian prior modeling for signal segmenting problems. We show that the limiting 
behavior of our model can be analyzed. 

Let us discuss the current perspectives in Bayesian modeling. Consider a linear inverse 
problem 

(1) M = AU + £ 

where U is the object of interest, £ a noise and M the measured data on some function 
spaces. In the Bayesian inversion these quantities are modeled as random variables and their 
probability distributions depict all information available prior to the measurement. With this 
information the goal is to make statistical inference on U given the model equation ^ and 
a realization M{uJo) of M. Sometimes the prior distribution of the object of interest U 
depends on an unknown parameter which then becomes part of the modeling and inference 
problem. Such prior structures are often referred to as hierarchical models. 

In practice the measurement is often produced by some finite-dimensional projection 
Mk = PkM. Furthermore, one also has to discretize U for computational purposes. This 
yields the computational model 

(2) Mkn = Pk {AUn +£)=AkUn+£k- 

Notice the two independent discretization levels n and k. Solving the inverse problem with 
the Bayesian approach requires two steps: first, one translates all a priori information into 
the probability distributions of Un and the noise £f^. The posterior probability Vkn{' I 
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i.e., the probability distribution of C/„ conditioned on the measurement m = Mfc(u;o), is 
then obtained by using the Bayes formula and equation Q. 

Usually the ultimate goal is to compute some information, e.g., point or spread estimates, 
from the posterior distribution. A point estimate that we discuss frequently in this paper is 
the conditional mean (CM) estimate which can be written for equation ^ in Euclidian 
spaces M" and as 



Now a natural question follows: what happens to the reconstructed information if f7„ or 8k 
is modeled on finer discretization, i.e., with a bigger n or kl Moreover, do the posterior 
probability distributions converge and how to guarantee that the reconstructed objects stay 
stable (e.g., CM estimate converges) as n and k increase? 

The interplay between solutions of problems ([T]) and ^ in general situations is not fully 
understood. However, some partial results exist. In fact, if Un and are obtained by projec- 
tions from Gaussian distributions the convergence of posterior distribution has been proved 
in very general setting by Lasanen in [32 1. To the author's knowledge only convergence 
studies with non-Gaussian posterior distribution have been done from this point of view 
recently in BTl and |[34l . These first positive results show some general conditions for ob- 
taining weakly converging posterior distributions and in addition converging CM estimates. 
We emphasize that these results require Gaussian noise distribution. 

Yet another non-trivial question is how to make sure that the crucial statistical properties 
of posterior distribution are not lost asymptotically? This is highly relevant to the edge- 
preserveness discussed above. Namely, in [33] it was shown that the usual modeling of 
TV prior carries an unpleasant defect such that the edge-preserving property is lost from 
the CM reconstructions as dimensionality of the problem increases. The reason behind this 
is that under different parameterization the prior distribution either converges to a Gauss- 
ian smoothness prior or diverges. In ['341 a non-Gaussian prior structure is proposed for 
edge-preserving CM estimates. The estimates n^^^ are shown to converge to so-called re- 
constructors that generalize the concept of CM estimates in infinite-dimensional spaces. 
We discuss this in more details later. The work by Piiroinen in [41] contains results about 
the existence of a discretization leading to converging posterior information in general non- 
Gaussian setting. 

Let us now review other related literature on the topic. First results on the Bayesian inver- 
sion in infinite-dimensional function spaces were introduced in by Franklin. This re- 
search has then been continued and generalized by Mandelbaum [37J, Lehtinen, Paivarinta 
and Somersalo |[35]| . Fitzpatrick lITSl . and Luschgy ifTSl . Lastly, we want to stress that the 
convergence of posterior distributions has also been studied from different perspectives. 
Namely, in li26l l27l l40l such convergence is studied when the objective information be- 
comes more accurate. Also, model reduction problems are considered in ll29i . For a general 
presentation on the Bayesian inverse problems theory and computation see ||28| and ifTTIl . 
The topic of probability theory in Banach spaces is covered in B9l1 . 

This paper studies the problem of edge-preserving reconstructions in signal restoration 
problems with the emphasis on how to locate discontinuities. For technical reasons we 
concentrate on periodic signals, i.e., the domain for our study is a 1-dimensional sphere T. 
We model our prior beliefs of the unknown signal u with a hierarchical structure ([/, V) 
where the auxiliary random variable V models how the discontinuities are distributed. The 
conditional distribution of U given a sample of V then models our prior information about 
u if we know where the discontinuities are located. Such Bayesian modeling has close 
connection to previous hierarchical segmentation methods [19, J0,j9j. The method draws 
also a lot of inspiration from the celebrated Mumford-Shah image segmentation method 
||39l and its variational approximation introduced by Ambrosio and Tortorelli [SO. 
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In this paper we introduce a finite-dimensional prior structure {Un,Vn) that produces 
a weakly converging posterior structure in the presence of a Gaussian noise. The main 
theoretical results concerning the prior can be divided into three parts: 

(i) There exists a well-defined random variable {U, V) : ^ —>■ L^(T) x L^(T) to which 
{Un,Vn) converges in distribution. 

(ii) The posterior distributions Vkn converge weakly in L^{T) x L^{T) assuming that 
the measurements converge. 

(iii) The CM estimate {u'in^v'^n^) converges to reconstructors of problem 

In addition we improve the results in fST] concerning the general theory. We implement 
our method in practice and include some numerical examples with computer generated 
data. The connection of maximum a posteriori (MAP) estimates to Ambrosio-Tortorelli 
minimizers that was presented in ||25]| is not studied here. 

This paper is organized as follows. In Section 2 we introduce relevant concepts and main 
results concerning the general theory. The infinite-dimensional hierarchical prior model 
{U, V) in L^(T) X L^(T) is defined in Section 3. We carefully show that such a construction 
is well-defined. Discretized prior distributions for (?7„, Vn) are constructed in Section 4. It 
is important to note that we can explicitly write down the related density functions. This 
becomes highly valuable in numerical implementation as no more approximations need to 
be made. Section 5 is divided into three parts. First the theorems of Section 2 are proved. 
Secondly, we show here that {Un, Vn) converges to ([/, V) in distribution on L^(T) x L^(T). 
We conclude Section 5 by showing the important property of uniformly finite exponential 
moments for the introduced prior structure. Finally in Section 6 we illustrate with numerical 
examples how our method works in practice. 



Next we define problem ([U rigorously. In order to do so let us introduce some notations. 
Below (•, •) refers to pairing of generalized functions with test functions. In real Banach 
space B the dual pairing is denoted by {■,-)b'xB- In a real Hilbert space H we denote the 
inner product by (•, . We denote the Borel sets in B by B{B). Throughout this paper 
whenever not explicitly mentioned we assume the measurable structure of Borel sets. The 
notation C{Bi, B2) stands for the space of bounded linear operators between Banach spaces 
Bi and B2, and C{B, B) is abbreviated as C{B). If the operator T : Bi ^ B2 is a bounded 
hnear operator, we denote the adjoint operator by T' : B'2 ^ B[. Recall also that a bounded 
hnear operator T in a Hilbert space H is said to be in the trace class if 



for some orthonormal basis {ej}^^ C H. We want to point out that the definition is 
independent of the choice of the basis. Throughout the paper if not explicitly mentioned C 
denotes a positive constant. For two functions /, (7 : X — > M U {00} we also write f ^ g 
if there exists a constant C > such that / < Cg as functions. Finally, for any s G M, let 
H^{T) be the L^-based Sobolev space lU equipped with Hilbert space inner product 



for any G H'{T). 

Let us return to considering problem ([U. Let S, P) be a complete probability space 
with a product structure Q, = Q,pr x Q^r, 5] = Ep^ ^er and P = Pp,. (gi Per. Through- 
out this section H will be fixed to denote a real separable Hilbert space. We assume the 
following conditions: 

(i) The mapping U : Qpr —^Hisa random variable. 

(ii) The mapping A: H ^ if^(T) is a bounded linear operator. 



2. General setting 



00 



Tih{T) := '^{Tej,ej)H < 00 
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(iii) The random variable £ : —>■ H^^{T) is Gaussian with expectation ¥,£ = and 
a covariance operator Cs : H~^{T) —^ H^^{T). 

(iv) The range of is dense in H^^{T). 

The conditions (iii) and (iv) imply that Cs is one-to-one, self-adjoint and in the trace class 
and that we have a unique positive and self-adjoint power C| for any t G M. Later in 
numerical examples £ has a covariance operator Cg = {I — A)~^ : H^^{T) —>■ H^{T). 
Such a random variable is white noise in the sense of generalized random variables [32|. 

Definition 1. Let fi be a centered Gaussian measure on {H,B{H)) and its covariance 
operator C : H ^ H such that Ran(C) is dense in H. We call the real separable Hilbert 
space 

Hi/,) = {fGH 
equipped with inner product 

for any f,gG H{p) the Cameron-Martin space (or the reproducing kernel Hilbert spacej 

This definition can be seen to coincide with the usual definition of Cameron-Martin 
spaces by Proposition 2.9 in llT3l . The Cameron-Martin space structure is used later in 
Section 4. For an extensive presentation on the topic in locally convex spaces see [7 |. 

If [/ G L^(0, S; H) and Sq is a sub cj-algebra of S, we denote the conditional expecta- 
tion of U with respect to a-algebra Sq by E(C/|So). That is, E(C/|So) G L^{9., Sq; H) and 
it satisfies 

(4) / E(f/|So)(tj)P(dw) = / U{uj)¥{du)) for all Dg So. 

All vector-valued integrals in this work are standard Bochner integrals. For more infor- 
mation on Bochner integrals see |[T4l . The operator '■ U ^ E(C/|So) is a projection 
Pso '■ L^{^, S; H) —>■ L^{Q,, Sq; H), where L^{Q,, Sq; H) denotes the space of measurable 
functions from {Q,, Sq) to (H, B{H)) which are Bochner integrable. 

Definition 2. Denote by ^A d the a-algebra generated by the random variable M. We 
say that any deterministic function 

(5) nM{U\-) : H-\T) -^H, m ^ nM{U\m), 
is a reconstructor ofU G L^{Q,, S; H) with measurement M if 

(6) nMiU\M{Lo)) =E{U\M)iuj) almost surely. 

If H is a real separable Hilbert space, g : {H,B{H)) {H,B{H)) is a measurable 
function and g{U) G L^{Q.,Ti] H), we define TIm{9(U)\-) : H~^{T) — > H to be any 
deterministic function satisfying 

(7) nM{giU)\M{uj)) =E{g{U)\M){u;) almost surely . 

We refer to fMl for the existence of TZm. Note that although Tl]\j is not necessarily 
unique it was shown in 1 34] that in the presence of Gaussian noise the following choice can 
be made: Assume that the prior distribution XofU has finite exponential moments, i.e., 

exp{c\\u\\jj)dX{u) < oo 

for any c G M, and assume H is a. real separable Hilbert space. Furthermore, let g : 
{H,B{H)) {H,B{H)) be a measurable function satisfying E||gf(C/)||^ < oo. Then a 



C 



-1/2 
X J 



< OO 
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function 7^A/(^7|•) : H-^(J) H defined by formula 

(8) nM[g{U) I m) ^ 



H(n, m)d\{u) 

is a reconstructor, where E : H x H^^{T) ^ M is the function 

1 2 -1 

E{u,m) = exp(-- pu||i2 + An,m)j|^-i). 

Throughout tfiis paper we make the above choice of reconstructors. 

As was discussed earlier the measurement is never infinite-dimensional in practice. Let 
us next explain how we assume the measurement to be obtained. 

Definitions. The finite-dimensional linear projections ■ H^^(T) H^^(T), k £ N, 
are called proper measurement projections when they satisfy the following conditions: 

(i) We have Ran(Pfc) C -fr^(T) and \\Pk\\c{H'^) — C^ofor some constant Cq with all 

(ii) Fort G { — 1,1} we have 

lim \\Pkf - /ll^, = 

for all f G H\T). 

(iii) For all ipji/j £ L^(T) it holds that 

The conditions in Definition [3] are same as in ll34l Thm. 3] and are motivated there. We 
note that in this paper these assumptions are only used in the proof of Theorem [T] 
In practical situation the measurement is a realization of a random variable 

(9) Mk = PkM = AkU + £k, 

where = PkA, £k = Pk£- In order to be able to compute a numerical solution one has to 
discretize also the random variable U (independently of P^) in H. Denote the discretization 
by C/„ : Q — > Hn C H in a finite-dimensional subspace Hn- Now the two discretizations 
with respect to n and k lead to the computational model We note that the reconstructor 
can be defined for all above models, for problem ([T]) on H^^{T) and for problems ^ and 
^ on Ran(Pfc)- Before next definition recall that probability measures /i„, n G N, converge 
weakly to // in {H, B{H)) if for every bounded and continuous function f : H ^ holds 
that 



lim / f{u)dfiniu) = / f{u)dfi{u). 

In the following definition we characterize a condition that allows converging probability 
measures to have only very small tails. 

Definition 4. We call measures fj, and n G N, o« {H, B{H)) uniformly discretized with 
exponential weights if 

(i) converges weakly to fion H and 

(ii) for every b > there exists a constant < C{b) < oo such that 

/ exp{b\\u\\fj)dfin{u)<C{b) and / ex.p{b\\u\\fj)dn{u)<C{b) 
Jh Jh 

for every n G N. 

We are now ready to formulate our main theorem regarding the general theory. We 
postpone the proof to Section 5.1. 

Theorem 1. Assume the following three conditions: 

(i) The operators P^ ■ H^^{T) — > H^^{T), /c G N, are proper measurement projec- 
tions. 
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(ii) The probability distributions ofUn,U : 17 ^ ff, n € N, are uniformly discretized 
with exponential weights. 

(iii) A continuous function g : H ^ H where H is a real separable Hilbert space, 
satisfies 

\\g{u)\\^ < Cexp(C ||u||^) 
for all u £ H with some constant C. 
Now let u = U{ijJq) and e = £{ojo) be realizations of the random variables U and £, 
respectively, and let 

m = Au + e and m^. = A}^u + P^e 

be the realizations of the random variables M and Mj. in equations © and respectively. 
Then the reconstructors defined by formula ©/or models ^ and ^ satisfy 



lim TZMk„{9{Un) I ruk) = TZM{g{U) \ m) 

fc,n— +00 



in H. 



Let E <Z H he & Boiel set and l^; be the indicator function of E. Define probability 
measures 

V{E I m) = nM{lE{U) I m), 

Vkn{E I nik) = 7^A4„ (!£;([/) | m^) 

on H with the same choices of reconstructor made in Theorem [U One notices that these 
measures correspond to the posterior distribution obtained from Bayes formula in the finite- 
dimensional case. An important corollary to Theorem [T]is shown in |[34l . 

Corollary 1. Let the assumptions in Theorem \I} hold. Then the measures Vkni' I i^k) 
converge weakly to the measure V{- \ m) on H. 

We conclude this section by discussing shortly how to solve reconstructors in practice. 
For the moment assume that all the conditions in Theorem[T]hold and dimRan(Pfc) = K £ 
N. Moreover, assume Un ■■ ^ ^ Hn C H where dimF„ = N £ N. Let 1^ : ^ 
and /Cfe : Ran(Pfc) be isometrics and let us use them to map the computational 

model Q into a matrix equation. In the following we use bolded notation for vectors and 
matrices in Euclidian spaces. Denote U„ = InUn = (u^, uj^)^ : $7 — > M^. This 
yields 

(10) Mkn = JCkMkn = AknVn + Efc 

where Afc„ G M^^^ and Mfc„,Efc : —>■ E^. The posterior density function 7rfc„ can 
now be easily obtained for problem ([TOl l via the Bayes formula. In Section |6] assumptions 
on the noise £ and the measurement projections imply that E^, is white noise. In such a case 
TTkn has the form 

nn(u„)exp(-i ||mfc - Aa;„u„||2) 
7rfc„(u„ I mfc) = — ^- , 

where n„ is the prior density and T^n is the density function of Mfc„. For a related dis- 
cussion on the discretization of white noise see the Appendix B in |[34l . The CM estimate 
corresponds to a reconstructor with g = id : H ^ H and it can be obtained by computing 
integral 

(11) := U7rfc„(u I mfc)(iu 

since with the choice of reconstructors in equation ^ it holds that 

(12) nM,„{Un\mk)=I-'{u^^') 
for any k,n £ 'N. 
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3. The continuous prior model 

In this section we introduce a hierarchical probability distribution in (T) x (T) and 
prove that it is well-defined. Denote first by Dq a perturbed derivation 

(13) Dq = D + e'iP : H^{T) ^ L^{T) 

with some q > 1 and a projection operator Pf{x) = {jjf{t)dt)l{x) for / G L^(T) 
and = 1 for every x G T. The reason for this perturbation is that the operator 

Dq : i/i(T) ^ L2(T) is invertible. Also denote L = Dq^ : L^{T) L^{T) and a 
multiplication operator A : ^^(T) C{L'^{T)) by 

A{v)f = ie^ + v^)-'f 

for any f , / E L^(T). Define operators 

(14) Cv = (^^I - eA^ and Cu{v) = LA{v)L* 

on L^(T) with each v G L^(T) where L* is the Hilbert-adjoint of L. It is straightforward to 
show that both operators iCu{v) with fixed v) are positive self-adjoint trace class operators. 
This allows us to define the following Gaussian measures on L^(T) which we use in the 
construction of the prior probability distribution. 

Definition 5. Let v be the Gaussian measure on (T) centered at value 1 (x) = 1 with 
covariance operator Cy and with given v E L^(T) let A" be the Gaussian measure on 
L^(T) centered at with covariance operator Cu{v). 

Remark 1. Now a possible way to proceed is to define a probability measure A on (L^(T) x 
L2(T), S(L2(T) X L^(T))) in such a way that with any measurable sets E,F C i3(L^(T)) 
we have 

(15) X{E X F) = J \''{E)dv{v) 

and assign X as a distribution to a random variable {U, V) : Q ^ L^(T) x L^(T). In 
fact, finding a unique extension to Xfor all Borel sets connects this problem to more general 
considerations of the existence of Markov chains with given transition operators Il20[|17[ l6l. 
The unique extension can be shown to exist using results related to stochastic kernels BOI . 
Also, in the framework of M -spaces and Markov operators the extension result here can be 
proved using Lemma L3 in BTI . 

However, in the rest of the paper the marginal distributions of A play a central role. We 
achieve more flexible framework especially for the analysis of the discretized distributions 
by constructing a suitable probability space and deflning random variables U and V sep- 
arately. Consequently, we exclude the extension proof at this stage since later the joint 
distribution of{U, V) is shown to satisfy equation (1151 ) as a byproduct of the construction. 

Remark 2. Throughout the rest of the paper we keep e > and q > 1 flxed. The role of e 
is to control how sharp edges we will have in the reconstructions. 

To simplify our notations we assume that the probability space has the additional struc- 
ture VLpr = X 0.2, Spr = Si ® S2 and Pp^ = Pi <X) ^2- 

Definition 6. Let V -.0.2 ^ L^(T) be a random variable with distribution v. 

We note that V has a very similar distribution with the so-called Gaussian smoothness 
prior. The smoothness prior is well-known to have realizations in //''(T) almost surely for 
any s < 1/2 and this can similarly be shown to V . In fact here the one-dimensional domain 
allows us to go further with the smoothness. Below the notation C'''" refers to Holder 
spaces with exponent a > and denotes the L^-based Sobolev space with exponent 
t e R (see ID). 
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Lemma 1. The random variable V -.0^2 ^ L^(T) satisfies following two statements: 
(i) For any t < 1/2 and 1 < p < co we have V € VF*'P(T) almost surely, 



and there exists a version V ofV such that V -.0,2 ^ iy*'P(T) is measurable. 
(ii) For any < a < 1/2 we have V G C*''"(T) almost surely and 

E \\V - l\\co.c < oo. 

Proof. Consider the centered variable V = V — 1. By the Schwartz kernel theorem there 
exists a unique distribution Ky/ G V'{T x T) such that {Cv'(p, V') = {Ky', 4> ® It is 
straightforward to verify that Ky is the Green function of ^/ — eA. Such a function is 
known to be Lipschitz continuous, i.e., Ky € C'^'^(T x T) and even in C°° outside the 
diagonal. Let t E [0, ^) and define a new kernel K on as 

(16) K{x,y) = (1 - A,.)*/'(l - l\yYl^Ky,{x,y). 

Now by Hi Prop. 13.8.3] and Hi Sect. 13, (8.7)], we have K{x,y) € C^'^-'^\T x T) 
and since t < ^, we have in particular that K is continuous and bounded. By IT] Prop. 
3.11.15] we have that for any 1 < p < oo there exists a random variable Vp in L'p{T) with 
covariance operator Cp : L'f' (T) LP{T), ^ + ^ = 1, such that 



Cpf{x)= / K{x,y)f{y)dy. 
Jj 

Furthermore, Vp satisfies 

E||Fp||Pp < oo. 

Due to fie', Prop. 13.8.3] and El Sect. 13, (8.7)] we can define for any 1 < p < oo a 
Gaussian centered random variable Vp = {I — A)~*/^l^ in W^'^iT) with the property 

^\\K\\w..<^- 

One notices that the covariance operator of Vp coincides with Cy . The claim (i) follows 
from the two distributions being the same. Furthermore, the Sobolev embedding theorem 
states that the space VF*'*'(T) can be embedded compactly into C'^'*^^/^(T) HI. This proves 
the claim (ii). □ 

Definition 7. From this moment on in all our analysis we replace V with such a version V' 
thatV'{L02) £ iy*0'P»(T)/ora//w G n with some fixed to and po and V : ^2 ^ W^o,po(^j^ 
is measurable. We keep denoting this new random variable by V. 

Let W : ^ H^{T), s < —1/2, be a Gaussian random variable satisfying KW = 
and 

(17) E{{W,<P)Hs{W,i;)H^) = {Cs^,tl;)Hs 

for any (pjip £ H'^ where Cg = {I — The random variable W is white noise in iJ*(T) 
in the sense discussed in Section 2. 

In the following the idea is to define U{uji,uj2) by operating to W{uJi) with a square root 
of the mapping Cu{V {u)2)). Since Cu{V{ijJ2)) was defined above on L^(T) we have to be 
careful how to define the square root. 

Let us begin by defining an unbounded bilinear form 6„ : (T) x (T) — > M, 



(18) 



Jt 



for (f>,'ilj e if^(T) and v E C°'"(T) with a > 0. Due to EB Thm. VI. 1.21, Thm.VI.2.1] 
there exists a unique linear self-adjoint operator By : V{By) —>■ -L^(T), 'D{By) = 
L2(T) I (e^ + v'^)Dq(l) £ H^(T)}, such that 

(19) K[cj),ij] = {B,<|),^lJ) 
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for all 0, V' £ T^{Bv) and V{By) is dense in L^(T). Moreover we can deduce 

(20) B, = Dl{e^ + v^)Dq, 

which is an invertible operator from V{Bv) to L^(T). The operator D* denotes the L^- 
adjoint of Dq. Clearly, B^ is the inverse of Cjj{v) defined in equation ([T4l ) for any v G 
C°'"(T). 

The operator B^ was constructed in such a way that its spectrum in L^(T) is strictly 
positive, i.e., a{B^) C [c, oo) with c = c(e) > 0. Next let us study the mapping properties 
of B^ in H^iT). We notice that B^ : H^{T) — > H~^{T) is an invertible mapping and the 
pairing {B^u,u) can be estimated with the i7^-norm of u from below. For later 
purposes choose 5 = 5{e) > such that it satisfies 

(21) {ByU,u)H~ixm > S\\u\\m 

for u G H^{T). It is important to note that both c and 6 are independent of v. As the 
spectrum of By is positive we can define a square root of Cu{v) as a Dunford-Taylor integral 

(22) r, = ^ / z'^/^By - z)-^ dz : H-\T) ^ ^-^(T) 



where 7 is the curve 



7 = {z G C : dist(z,M_) = f} 



2- 

oriented in such a way it turns around the origin in the positive direction. Furthermore, 
z ^ maps C \ 1„ ^ C so that E+ maps to itself. By Thm. V.3.35] the 

restriction of F^ to L^(T) is an unbounded self-adjoint operator and by OTi Lemma V.3.36] 
satisfies 

(23) {Ty\L2f = B-^\l2=Cu{v) 
in L2(T). Next we prove a uniform bound for the norm of F^, . 

Lemma 2. There exists a constant C = C{s,6) such that for any a > and for all 

V G C°'"(T) we have 

(24) l|rt)||£(Hs,L2) < C 
with s > —1. 

Proof. Let a > and v G C°'"(T). We prove the claim by interpolation arguments. First 
note that 

0« il<^--'>"llaL'.S dis.(..'(BW) 

for any z G 7. Recall now that B^ — z with z G 7 is an invertible operator between spaces 
H^{T) and H-^{T). We assume that / G H-^{T) and u G i?^(T) satisfy equation 

(26) (S, - z)u = / 

in H^^{T) for some z G 7. Taking duality pairing of / with u in equation (l26l) yields then 

(27) {ByU,u)H-ixm = z WuWli + {f,u)H-lxH^- 
For z G 7 we have Re(z) < 6/2 and thus 

S 2 

(28) {B,aU,u)H-ixm < ^W^Wl^ + ^^(f^'^)H-^xm- 



Combining inequalities (1281 ) and (|2TI) we get 

< ^ ||u||^i + ||u||^i II J ||j;^-i 

This yields the bound 

(29) ||(,_s,)-i||^^^_^_^^^<2 
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when z G 7. The equation (|27] ) imphes 

(30) Rc{-z + d)\\u\\l2 = -{{B,,u,u) - 6\\u\\l2) +Rc{f,u)H-i^m 

where we have added the term 5 ||n||^2 and taken the real part. Again due to inequality ((2TI) 
the right hand side is less than Re.{f,u)H-iy.fji. Furthermore by applying the Cauchy- 
Schwarz inequality and inequality (1291 ) we have 

(31) II^^Mr^lll/ll"- 

which proves the estimate 

(32) l|(^-^-)"i£(i^-i,L2) ^ kl"'/' 

with 2; G 7. Now we are ready to interpolate (see, e.g., ll46l Prop. 13.6.2], Q and B71 ) 
equations (l25l) and (l32l) and get 

(33) ||(. - :< (N-^O" ( dis.(., V.)) )"' ^ I'l-"' 
for — 1 < s < 0. For s > — 1 and z G 7 we see that 

is an integrable function on 7. Finally this yields 

ll^f ll£(i^^L2) ^ C", 

for any s > — 1 with some C = C{s,6) > that is independent of v. □ 
Definition 8. Define the mapping U : Q ^ (T) as 

(34) Uiu;i,L02)=Ty^^^)W{iOi) 

where W is the centered Gaussian random variable defined by equation (1171 ) in (T) with 
some — 1 < s < — 1/2. 

Let us show that this mapping is measurable and hence a random variable. Recall that a 
function X : Q ^ is said to be strongly measurable if there exists a sequence {Xj}JLi 
of simple functions converging pointwise to X. In separable spaces such as i?*(T), s > 0, 
the measurability is equivalent to the strong measurability. In addition, an operator valued 
function X : Q ^ C{Hi , i?2) is said to be strongly measurable if the vector valued function 
(J I— > is strongly measurable in H2 in the sense presented above for all f € Hi. 

Proposition 1. The mapping uj2 ^ f'v(uj2) ^ C{H^{T), L^(T)) is strongly measurable for 
all -1 < s < -i. 

Proof Recall from Definition|7]that y is a -valued random variable. As such a space 

is separable we have a sequence of simple random variables Vj converging pointwise to V. 
Due to the Sobolev embedding theorem there exists < a < 1/2 and C > such that 

(35) \\Vjiu;2) - V{u;2)\\co.. < C \\Vj{u;2) - F(c^2)|| 

for all L02 G ^2- Hence Vj converges pointwise also in C'''"(T). Next fix L02 G 0,2 and set 
Vj = Vj{L02) for all j G N and v = V{lo2)- Let us factorize the operator 

{B, - z)-^ - {B,^ - z)-^ = (B, - z)-\B,^ - B,){B,^ - z)-^ : H'{T) ^ L\T) 

where the right hand side operators are considered as a sequence of mappings 
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An operator and its adjoint have the same norms and, since {z \ z £ 'y} = {z \ z £ 7}, 
inequaUty (|33] ) yields 



(36) \\{B^ - z) ||£(^2^^i) = \\{Bv - z 
Interpolating inequalities (l36l ) and ( |29l ) gives us 

(37) Z) ||£(iys^j:^l-) 



1-1 I 



l£(/f-i,L2) 



■< \z 



-1/2 



for —1 < s < —1/2. In the same way as above we see how the operator — B^, maps 



Da 



l2(t) _A/7-i(T). 



In this framework the operators Dq and D'^ are both bounded. The multiplication operator 
is also bounded and converges to zero in the norm topology due to (|35] ) as j increases. 
Altogether this yields 



(38) 



lim LB„, - BJ 



0. 



Now returning to random variables Vj and V and adding up inequality (|37] ) with 
get 



we 



{B 



(B 



£(/^^L2) 
< C 



B 



V(UJ2) 



B 



for all L02 G ^2 and furthermore 

< c 



L2 



\Z\ 2 ' 2 



B 



V(U2) 



< C 



H' 



dz 



B 



V{ijJ2) 



B 



Vj{i^2) 



for all / G H^iT) and uj2 G ^^2- Due to equation (1381 ) this proves the claim. □ 

Corollary 2. The mapping U : Q ^ L'^(T) in Definition\8\is strongly measurable. 

Proof. According to the Proposition [U we can take simple random variables Fy, that con- 
verge pointwise to Ty in C{H^ {T) , iT)) and simple random variables Wj that converge 
pointwise to W in H^{T) with s < —1/2. Now for any lo = {lui,uj2) G ^ we have that 



rvi^,)W{uJi)-Tv^^^^)W,{uJi) 



L2 



< r 



Vi-^2)\\c(H',L^) 11^(^1) - Wj{uJi)\\j^, 



+ 



ViuJ2) 



Vj{0J2) 



\Wj{u;i)\\ 



H" 



□ 



converges to zero for— l<s<— 1/2. 

Let us return to the discussion in Remark [T] Also let— 1 < s < —1/2 and fix 002 G 
and V = V{uj2)- For any (/>, "0 G Lp'iT) we have 

E(C/(.,a;2),(/')L2(?7(-,u;2),V')L2 = E(H^(-),F»^.,^-.(W^(.),r;V')j/»^//- 

= E(I^(-), C.sK<t>)Hs {W{-), C7_,F»H^ 

= (C,C_,F>,C_,F»H^ 

= (r?</',V'>L2 

where Ct = {I — A)* for t G M. By the Fubini theorem we can deduce that the probability 
distribution of ([/, V) on ^^(T) x ^^(T) is some extension of A defined in equation ([TS] ). 
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4. The finite-dimensional prior model 

We have two objectives in the construction of a finite-dimensional prior model for the 
discretized problem Q. Obviously it is necessary to have weakly converging probability 
measures. After defining f7„ and ¥„, this property is proved later in Section 5. The second 
objective is to be able to compute the probability densities explicitly. For anyone applying 
such a method in practice it is valuable that no additional approximations are needed. The 
main difficulty in obtaining the explicit form is clearly the nonhnear dependence of Cu{V) 
with V. 

The following definitions can be intuitively considered as truncated random series or 
projections of the original random variables U and V. There is a well-known result 171 
Prop. 3.5.1] about Gaussian series which states that Cameron-Martin space provides a 
natural framework for the basis of the series. Also, as we will see, this approach makes 
it easier to control the nonlinearity discussed above. 

Notice that the Cameron-Martin spaces H{v) and i?(A^('^2)) for all fixed uj2 G ^2 have 
equivalent norms with the standard norm of H^{T). More precisely, the norms satisfy 

(39) IMI/fH = ^ IMIl2 + e|l^-|lL2 and = {{e^ + v'^)Dq-, Dq-)L2 . 

This can be shown by density arguments after the equalities are first established for func- 
tions in C°°(T). 

Inspired by this connection we show that the continuous and piecewise linear functions 
provide a suitable framework for the discretizations. For any n G N define 

(40) PL{n) = {f £ C(T) | / is linear on each K^,j = 1, A^} C H^iT) 

with Kj^ = [{j — 1)/N,j/N), j = 1, N. The value of N depends on n and for the rest 
of the paper we fix notation 

N = N{n) = 2". 

In addition, whenever needed we consider T as the closed interval [0, 1] with the point 1 
identified as 0. Notice that with the notation above PL{n) C PL(n + 1) for all n G N. 
Define also piecewise constant functions on the same mesh 

(41) PC{n) = {/ G L2(T) I / is constant on each = l,...,N} C L'^{T). 

In the following we use frequently the fact that -D<j|pl(„) : PL{n) — > PC{n) is an invertible 
mapping. 

4.1. The definition of y„. Let us consider for a while H^iT) equipped with the inner 
product (•, •)h{v)- Form an orthonormal basis {gj}^i with respect to this inner product 
so that for each n G N the set {gj}^^i spans PL{n). Define an orthogonal projection 

Rn : H^{T) PL(n) C H^{T) as 

N 

Rng = '^{g,gj)H{u)9j 

with g G H^{T). A short computation yields that the corresponding adjoint operator in 

H-\T) is 

N 

R'nd' = ^{a' ,9j)H-^xmCy^9j 

for any g' G i^^^T). 

Definition 9. Define ¥„. : ^2 PL{n) C L^{T) as 

N 

(42) K(u;2) = J^Vf(c^2)<?j +1, 
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where Vj^ : ^ 1^ fl;re independent random variables with standard normal distribution, 
l{x) = 1 and Qj G PL{n) are as chosen above. Denote the probability distribution ofVn 
on L2(T) by u„. 

Let us shortly consider the covariance operator of Vn in L'^{T). Clearly, for any cf) G 
L2(T) it holds that 

Furthermore, we have that 

{CvM)l^ = E(F„-l,0)i2(F„-l,V')L2 

N 

j,k=i 

N 

= {RnCv(t),1p)L2 

for any <p,il^ & L^(T). Hence we can conclude that 

Cv„ = RnCvB!n\L2 : L\T) ^ ^^(T) 

for all n G N. 

4.2. The definition of C/„. The discretization method applied to V cannot be used with U 
since we do not want the corresponding basis to depend on reahzations of V. To avoid this 
consider now (T) equipped with the inner product 

= {Dqf,Dgg)L2 

for f,g £ H^{T). In the same manner as above form an orthonormal basis {fj}'jLi C 
H^{T) with respect to (•, •) so that for each n G N the set {/jj^Li spans PL{n). Define 
then an orthogonal projection Sn ■ H^iT) PL{n) C H^{T) as 

AT 

(43) Snf = Y.{f,fj)Dji 

for any / G i?^(T). The dual operator S'^ : can then be written 

N 

for any/' G /f-i(T). 

The functions {Dqfj}^\ C L^(T) form by definition an orthonormal basis to L^(T) 
with respect to the usual inner product of L^(T). Denote by r„ the orthogonal projection 

N 

Tn(t) = ^<lfj)L2Dqfj 

from L2(T) to PC{n) C L2(T). One notices that 

TV 

DqSnD-'<P = Dq ^{D-'cj,, fj)Djj = T^<t> 

i=i 

for any (j) G L^(T). The projection T„ is self-adjoint on L^(T) and hence we also have 
equality Tn<j) = {D'^~^ S'^D'q<j) for any (f) G L^(T). Let us next show an auxiUary lemma 
about the convergence of the projections Sn- 
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Lemma 3. For the orthogonal projection Sn defined in equation (1431) it holds that 

Y\m^\\I - Sn\\c(^H^^H^) = 

for any t < ^. 

Proof. Let t < 1/2 and notice that {DqD'^Y^^ is a trace class operator in L^(T). Since 
trace is invariant witli respect to the basis and norms \\-\\fjt and ||D*-||^2 are equivalent, we 
have that 

since functions {Dqfj}'jLi are an orthonormal basis in L^(T). Let 6 > and choose so 
that 

j>N 

Obviously the functions {Dg/j}^^ also form an orthonormal basis for L^(T) and we can 
write for each / G H^{T) 

oo 

\\f-Snf\\% < CY,{Dlif-Snf),D'jj)l, 

i=i 

oo 

i=i 

oo 

< CY,\\DJ\\lUl-Tn)iD',Yfj\\l, 
since {D'^Yfj S L^(T). Hence we can estimate the sum as follows 



\\f-Snf\\% < Clf^ 



oo 



k>n 



\DJ\\ 



2 



\j=l k>n / 
\k>n / 



< C5\\f\\l, 



when n > N. 



□ 



Before defining C/„ let us still introduce one more notation. Let A„ be the multiplication 
operator 

An{v)f = + (Qnvfr'f 

for any v G L2(T) and / G L'^{T) where 

TV 

QnV = Ny I v{x)dx-\j^N 

] = l 3 

and Ip-N is the indicator function of the set Kf = [{j — 1)/N, j /N). 
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Definition 10. Let Un '■ ^ ^ L"^ (T) be the random variable 

N 

(44) C/„(a;i,a;2) = ^ Uf (a;i, u;2)/i 

where the random vector U^(cj) = (Uj^(a;))^i G /i' given the following structure: 
Denote by uj2 ^ ^{^2) G M^^^ a random matrix such that 

Cjk{i02) = {kn{Vn{uJ2))Dqfj, Dqfk) ■ 

Due to the positive definiteness of C we can define 

V^iuj) = C(tJ2)^WAr(tJi) 

where W^v : ^^i — > is centered Gaussian random variable with identity covariance 
matrix. 

The measurability of : 17 is a consequence of the mapping uj2 ^ Ki('i^2) 

being measurable. Also it follows from Definition [lO] that with fixed uj2 the probability 
distribution of uji ^ C/„(u;i, ^2) is centered Gaussian with covariance operator 

Cc7„(K(a;2)) = SnD-'Kn{Vn{u:2)){D'^)-\Sn)'\L^iT). 
This can be seen from the short computation 

TV N 

{Cu„4>,i^)L2 = ^^Cjk{fj,(t))L2{fk,'ip)L2 

j=l k=l 

= /a„(K(^2)) iY,{fj,(P)L^DjA ,Y,{fk,ij)L^DJk) 

\ \j = l J k=l 

= {An{Vn{u;2)){DXHSny^, {D'^)-\Sn)' i^) 

for all G L2(T). Denote the distribution of t/„(-,u;2) on ^^(T) by Ajf"^"""^ and the 
joint distribution of ([/„, F„) on ^^(T) x L'^iJ) by A„. 

4.3. Prior density. Let us show in this subsection how the prior density function of the 
random variable {Un,Vn) can be written down explicitly. Consider mappings Tn,Jn '■ 
PL{n) such that 

l^^xj/jj =x and Jni^^^jgj 

for any x = (xi, ...,xjv)^ G M^. Use the following notation for the density functions: 
let n(-u„,v„)' and nu^|v„(' I Jnv) denote the densities of the probability measures 
A„ o (Z^7^, ^^^) on and f„ o J,^^ and AJ'j o on M^, respectively, with any v G 
PL{n). Below (p oc ip denotes relation (j) = cijj with some constant c. 

Theorem 2. Lef u G PL{n) be arbitrary and v = J^v G M^. T/jen 

/ 1 / 2 1 2 

(45) nviv(v) oc exp ( -- ( e llD^jjl^a + 4^ ll"^ " 1|Il2 

with = 1 for all x G T. 

Proof. We recall that by definition \^ are independent standard Gaussian random variables 
for all 1 < j < iV. It is easy to see that 

||v- Jnllljjjv = \\Jn{v-l)\\iiN = \\v-l\\H{y) 

since Jn is an isometry between PL{n) C H{u) and M^. By equation (|39l ) we now obtain 
the claim. □ 
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Theorem 3. Let u,v £ PL{n) be arbitrary and u = InU, v = J'nV G M^. Then it holds 
that 

nu„|v„(u I v) oc exp (^-i (^^-iVlog(e2 + (Q^v)^) + (e^ + {Q^vf)\Dgu\^dx^^ . 

Proof. The density function of a Gaussian random variable in can be written as 

nu„|v„(u I v) oc exp ^-^(logdetC + (u, C"^u)B;iv)^ 
where the matrix C depends on v and its elements satisfy 

Cjk = {An{v)Dgfj,Dqfk)L^ 

for I < j, k < N. Our challenge is to compute explicitly det C and the inverse matrix C~^. 
Notice first how A„(v) maps PC{n) to itself. Inspired by this let us consider C as a matrix 
representation of the linear operator A„(f) : PC{n) — > PC{n) in the basis {Dqfk}^^^. 
Next consider another L^-orthonormal basis for PC{n), namely, {V^lx^^lfci- Let the 
matrix S G M^^^ be the matrix presentation of the change of the basis {Dqfj}jLi to 
{Vn Ij^n}^.. The components of this matrix are given by the formula 

(46) Sjk = {Dgh,VNlKf)L^ 

for I < j,k < N. Moreover, S is invertible and satisfies S^^ = S^. 

Now the key notion is that since An{v) is diagonal in the basis {VN'i-K^}f=i' 
factorize matrix C as 

C = S^^LS 

where the diagonal matrix L is the representation of the multiplication operator A„(w) in the 
basis {\/iVl^]v}:J^i . One can show that the diagonal of the matrix L consists of elements 

3 ■' 

((e^ + iVli^Jv)2,2 for 1 < i < iV. This immediately yields that 

3 

N 

(47) detC = detL = J[{{e^ + (Q„i;)2)-i, iVl^Ar)^^ . 
Now we have 

N 

-logdetC = V-log((e2 + (Q„t;)2)-i,iVl^^)^2 = / N\og{^ + {Qnvf)dx, 
which yields the first part of the density function. Furthermore, a simple computation yields 

N 

(u, C-iu)i,.v = (Su, L-iSu)i,.v = Y.{e^ + {Qnv)\Nlj,^) (Su)^. 

i=i 

Assume that u = Ylik=i ^kfk and u = (ui, u^)'^ G R^. Then by the equation (l46l ) it 
holds that 

N 

^ ' 3 3 

k=l 

and finally 

TV ^ 

i=i 

= [ie^ + {Qnvf)\Dqufdx, 

which proves the statement. □ 
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We conclude this section by pointing out that 

(48) n(u„,v„)(u,v) = nvjv) -nu^ivju I v) 

for any u, v G M^. In consequence, the joint density is obtained from lemmata [2] and [3] 
5. Convergence of the CM estimates 

Two previous sections were devoted for the construction of the prior distributions. This is 
however only halfway in our search for a scalable reconstruction method. In order to show 
the convergence of conditional mean estimates one also has to consider the interplay be- 
tween likelihoods, prior distributions and the measurement equation. We turn our attention 
to this in the following. 

5.1. General conditions. Some general conditions under which reconstructors converge 
were given in [34|. We generalize these conditions in Theorem [T] The essential difference 
is that the finite-dimensional priors are not given by linear projections. Note that here we 
consider now a general prior random variable U : Q ^ H with a real separable Hilbert 
space H. Let us first prove a version of the Vitali convergence theorem for probability 
measures satisfying Definition |4l 

Lemma 4. Assume that fin cind fi are uniformly discretized probabUity measures on H. 
Suppose that f : H ^ M is continuous and < f{u) < exp(6 ||^t||/^)/o?' some constant 
b > 0. Then we have the convergence 

(49) lim / f{u)dfxn{u)= [ f{u)df,{u). 

Proof. Let us first denote Bj = {u £ H \ f{u) > j} and fj{u) := min(/(u), j) for any 
u £ H. We get an upper bound for the probability of Bj by 



(50) fi{Bj) < - [ f{u)dfi{u) < - [ exp 



{b\\u\\)dfi{u) < 



C{h) 



J 

where C{h) is given in Definition |4l Notice how the exactly same bound applies also for 
fjLn{Bj). From equation (l50l) we can deduce 



/ \f-Mdfi = [ \f-Mdfi 

Jh JBj 

< 2 / exp{b\\u\\fj)dfj,{i 



< 

where C{b) = 2^J C{2b)y^C{b). Again the same procedure applies for /i„ yielding the 
same upper bound. Notice carefully that the bound does not depend on n. Now the result 
follows by approximating 

/ fdfJ,n- / fdfi < / \f-fj\dfJ,+ / \f-fj\dfin+ / fjidfl-dfln) 

Jh Jh Jh Jh Jh 

and using the weak convergence. Namely, for each 5 > we can choose j so that we have 
C{b)/^/J < 5/^. On the other hand for each j there exists n' so that | J^^ fj{dfj. — dfj,n)\ < 
6/3 for each n > n'. This results to 

fdfin - / fdfi < 6 
H Jh 

when n > n'. □ 
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Combining Lemma |4] and the formula ([H) we can now prove Theorem [T] 



Proof of TheoremU} First, let us consider another measurement model 

(51) ekn = AkUn+£, 

where the noise is not discretized and is now infinite-dimensional. The reconstructor for- 
mula can be used for this equation giving 

with 

1 2 -1 

(53) E{u,mk) = ex.p{--\\Aku\\^2 + {C^ ^^u, mfc)j|^-i) < exp(6 ||n||^) 
with some b > 0. Now Lemma |4] yields 

(54) lim ne,MUn)\mk) =nM{g{U)\m). 

fe,n— >oo 

The claim follows from |[34l Lemma 11. □ 

5.2. Weak convergence of the prior distribution. The Proposition 3.8.12. in Q yields 
the weak convergence of measures Un. 

Lemma 5. The probability distributions Vn converge weakly to v on L^(T). 

We want to show that with fixed a;2 € the distribution An"'^'^^'* converges weakly 
to A^^'^^). Since X^^^"^^ is not obtained with a straight-forward projection as in the case 
of Vn we recall conditions that ai^e needed in the weak convergence of general Gaussian 
distributions. The following lemma is proved in [7] as Example 3.8.15. 

Lemma 6. A sequence of Gaussian measures fin with means a„ and covariance operators 
Cn on a separable Hilbert space H converges weakly to a Gaussian measure /i with mean 
a and covariance operator C if and only if the following conditions are satisfied: 

(i) lim„^oo \\an - o-Wh = 0' 

(ii) lim„^oo l|Cn - C'||£(H) =0 '^"'^ 

(iii) lim„^ooTrj^(C„) = TTHiC). 

Let us prove an auxiliary lemma concerning the convergence of the multiplication oper- 
ators. 

Lemma 7. Let Vn ^ v in W^^'P'-'' (T) as n ^ oo. Then we have 

lim \\A{v)-An{Vn)\\aL^)=0. 

Proof. First notice that for some q > we have by the Sobolev embedding theorem that 
\\v — Vn\\(jo,a 0. For any continuous / : T — > M denote 

||/IL = sup|/(x)|. 
Let us then compute an upper bound 



1 1 



g2 + ^2 g2 + {QnVny 



<^{\{QnVnf-vl\ + \vl-V^\ 



< ^ (2 ll^'nlloo \QnVn - Vn\ + (H^nlL + ll^lloo) 11^" " ^ID • 
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Here the term \QnVn — Vn\ can be estimated pointwise as 

{Vn{x) - Vn{y))dy 
\Vnix) - Vn{y)\ I 



N / Vn{y)dy - Vnix) 




N 






< 




f - 




< 


1 





\x - y\ 



y^ dy 



n||(70,( 



where x G and is the half-open interval [{j — 1)/N,j/N). The above yields 



lim 

n— >oo 



e2 + ^2 g2 + (Q^^^)2 



and thus 

lim \\{A{v) - K{vn)) fWl, < lim 

n—foo n—foo 

for all / G L2(T). 

Lemma 8. Assume Vn G PL{n) and Vn - 
weakly to on L^(T). 



g2 + ^2 g2 + (Q„^;„)2 



2 

L2 



0. 



n 



V m VF*'''P*^(T). r/je measure \^ converges 



Proof. The condition (i) in Lemma [6] holds as the means stay constant. Furthermore, 
condition (ii) follows from the suitable convergence of the operators Sn and A„. Since 

L* = {D'q)^^\]^2 we see this from 



\\CuM-Cu{v)\\ciL^) 



< 



\SnD^^K{vn){D'^)-^S'^ - D^'A{v){DX'\ 



+ \\Sn 



)\\Dq'^-^n{Vn){D'q) 1 1 £(^2^//l) 



In the first two terms of the right hand side recall that A„(f„) is uniformly bounded in 
>C(L^(T)), i.e., the bound is independent of Vn- Since also D^^ is bounded from L^(T) to 
H^{T) we see that Lemma [3] provides the convergence of these terms. The convergence of 
the third term follows from Lemma|7J 

Let us next consider condition (iii). Recall now the projection T„ = DgSnD^^ : 
L^(T) L2(T). In the following we consider r„ from L'^{T) to i7*(T), s < 0, and hence 
the dual operators occur. Denote ej{x) = e^'^*-'^ for all j £ 1, and notice |(D^)^^ej| < 
where (j) = | j| + 1. We can then write 



(((r„ - /)A„K,)T;)(D;)-ie„ {DX'ej)L2 
+((A„K) - A{v))T:,{DX'e„ {DX'e,)L^ 
+(A(t;)(r^-/)p;)-ie„p;)-ie,)i2. 



Let us study the three terms separately: a dual norm estimation yields an upper bound for 
the first term 



(((r„ - I)A„,{vn)T;^){D')-'e„ {D')-^e,)L2 



< \\T 



< C ||T„, - 



IP: 



^\\c{L'^,H=)j 



-2-s 
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for any — l<s<— 1/2. In the second term we can use Lemma|7]to get 

{{Anivn) - h{v))T'n{D',)-'e,, {D'^Y' ej) 

< \\K{Vn) -Mv)\\c{L^) ||^n||£(L2) || (^g)"^ej 11^2 < o{n)j~'^, 

where o : N ^ [0, oo) denotes a function that satisfies lim„^oo o{n) = 0. The tiiird term 
yields similar upper estimate as the first term since 

{A{v){T:,- I)iD'^)-'e„{D'^)-'e,)L2 

^ C \\Tn - I\\c{L2,H'')j ^ ^• 

Due to Lemma |3] and the fact that Dg is invertible between H^(T) and i?*^^(T) for any 
t E M we have ||T„ — I\\^i^2 fjs-^ = o{n). Combining these three bounds yields 

((r„A„(t-)r^ - A(t;))(Z);)-ie„(Z);)-ie,)i2 < o{n)j-'-\ 

Since " ^i'^h — s > 1 is finite, we have shown that Tr]^2{Cu{v) — Cu„{v)) 

converges to zero. This concludes the proof. □ 

Let us recall the Skorohod coupling theorem. 

Theorem 4. Suppose that a sequence ofBorel probability measures fin on a complete sep- 
arable metric space B converges weakly to a Borel measure /U. Then there exists a proba- 
bility space (r2,IP) and measurable mappings X,Xn : 17 — > i3 such that /i„ = P o X~^, 
H = W o X"^ and X„ X a.s. 

At this point we fix 0,2 according to Theorem|4]in such a way that Vn ^ V in W^°''^°{T) 
almost surely. This choice is made to achieve the final result. Before following theorem 
recall the definition of uniform tightness: A sequence {fJ-n}^=i Borel measures on Banach 
space X is said to be uniformly tight if for every 6 > there exists a compact set Ks C X 
such that fin{X \ Ks) < S for every n G N. 

Theorem 5. When n goes to infinity the random variable {Un,Vn) converges in distribution 
to{U,V)in L2(T) X L2(T). 

Proof. Let us first show the uniform tightness of the sequence {A„}^i where A„ is the joint 
distribution of (C/„,K) on ^^(T) x L^{T). The convergence of Vn in distribution yields 
that probability measures {i'n}^=i are uniformly tight. Let (5 > be given and choose a 
compact set K2 C L^(T) in such a way that ^^(1^2) > 1 — |. Next we consider the tightness 
of a family {A^ | v € K2,n E N}. By Lemma[8]the sequence {A,°}^]^ converges weakly 
and in consequence is uniformly tight. We choose A'l C L^(T) so that X^iKi) > 1 — |. 
We may also assume that Ki is absolutely convex since by Proposition A. 1.6 in 171 closed 
absolutely convex hulls of compact sets are compact. Recall the definition of the covariance 
Cu{v) = LA{v)L* of AJ^ in equation ([TH). For any fixed v G L^(T) we know that 

/ {u,<l))l2dXl{n) = {Aiv)L*<P,L*<P)L2 < 1 = / (n, ,/.)i,<iA°(n). 

Jl2(T) e JL2(T) 

for aU G L'^{T). By Theorem 3.3.6 of d this yields 
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Now we are able to deduce the uniform tightness of {A„} by setting = x i^2> namely, 

\n{{L'{T)xL'{T))\Ks) 

= Xn{{L\T) \ Ki) X K2) + A„(l2(T) X (l2(T) \ K2)) 

<[ X"^{L\T)\Ki)dvn{v)+ I Xl{L\T))dvn{v) 
Jk2 Jl^{j)\K2 

< - + - = 5. 
- 2 2 

Moreover, by the Fubini theorem the characteristic function of {Un,Vn) can be written as 

Eexp {i{Un,(l))L^ +^(K,V')l2) 

= E I / exp(i(n,,/.)i2)dA^('^2)(n)exp(i(K(^2),V')L2) ) • 
\Jl2(j) J 

The almost sure convergence of Vn and Lemma [8] together imply 

lim / exp{i{u,(l))L2)dXl^^'^''\u)= [ exp{i{u,^)L2)dX^^'^^\u) 
and furthermore 

lim e-x.p{i{Vn{i02),tp)L2) = exp(i(y(u;2), V')l2) 

n—foo 

for almost every u;2 S 1^2- In consequence, we see by the Lebesgue dominated convergence 
theorem that the characteristic functions of {Un,Vn) converge to the characteristic function 
of {U, V) pointwise. 

By Corollary 3.8.5 in fl] the uniform tightness and pointwise converging characteristic 
functions yield that the random variables (C/„, Vn) converge in distribution. Since two mea- 
sures on B{L'^{T) X L^(T)) with equal characteristic functionals coincide we conclude that 
{U, V) is a limit. □ 

5.3. Uniformly finite exponential moments. In this section we establish the uniform ex- 
ponential boundedness of Vn), n G N and ([/, V). Here we denote 

\\{f,9)\\L^.L^ ■■=\/\\f\\h + \\9\\l2 

for all /,geL2(T). 

Lemma 9. For every 6 > there exists a constant C(b) > such that 

(55) Eexp(6||(?7„,K)||L2xL2) <C(6) and Eexp(6 ||([/, y)||^2xL2) < C(6) 
for every n G N. 

Proof. Let us first show the boundedness of the exponential moments of ([/, V). By using 
the inequality ||(/,5)||l2xl2 < II/IIl2 + 11911^2 and Lemma |2] we have 

Eexp(6||(C/,F)||^.,^.) < Eexp(6||rv^(^,)VF(u;i)||^, + 6||y(..2)|li2) 

(56) < EeMHW{uJi)\\Hs)-^^Mb\\V{uJ2)\\L2) 

with some constant h > Q and some — 1 < s < — i. Moreover, the Fernique theorem |[T3l 
Thm. 2.6] states that for every Gaussian random variable X in Banach space {B,B{B)) 
there exists a constant a > such that 

Eexp(a||X-EX|||) < 00. 

Let 6 € M be arbitrary. The trivial estimate < a{x — 5/2a)^ for any x G M yields 

(57) Eexp(6||X||^) < exp(6 ||EX||^) • exp(6V4a) • Eexp(a ||X - EX|||) < 00. 
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Now the claim for {U, V) follows by applying inequality (|57] ) to the right-hand side of 
inequality (l56l ). 

The uniform bound for n G N, requires more careful analysis. Consider for the 

moment a Gaussian random variable X in L^(T) with covariance operator Cx '■ L'^{T) — > 
L^(T) such that dimRan(Cx) = ^ < oo. Let {pjYj^^ be the non-zero eigenvalues and 
{4>j}j=i be the corresponding L^-normalized eigenvectors of Cx- Notice that the normal 
random variables {X — ¥,X, 4>j) 12 and {X — EX, (/>A;)i2 are independent when j / k. For 
any a < l/{2pj), 1 < j < ^, we have 

E(exp(a(X - EX,(Pj)l2)) = (1 - 2apj)-^. 

The operator Cx is positive definite and hence 

max Pi < TrT2(Cx)- 
ie{i,...,n 

Notice now that (1 - s)-i/2 < 1 + s with < s < 1/2. In consequence, if a satisfies 

1 

4TvL2{Cx) 

then for every j = 1, £ it follows that 

(58) E(exp(a(X - EX, (pjf)) < 1 + 2apj < exp{2apj). 

Due to the independence of random variables {X — EX, (j)j)L2 and HSll we have 

t 

(59) Eexp(a||X -EXll^a) = JJ Eexp(a(X - EX, (/.j)!^) 

i=i 

i 

< exp(2a^pj) < exp(2aTri2(Cx)) < 00 
i=i 

where we have used the inequality (l58l) . Combining inequalities (l57l) and ( [59l l in the case 
B = L2(T) yields 

(60) Eexp(6 ||X||^2) < exp(6 ||EX||^2) • exp{b'^/4a) ■ exp(2aTri2(Cx)). 

Let us next show that the trace of Cu„ {Vn{u!2)) is bounded uniformly with respect to n G N 
and 0J2 G ^^2- Denote ej{x) = exp(— 27rijx) for a; G T and j G Z. A straightforward 
computation yields 

TlL2{CuAVn{iV2))) = Y.^K{Vn{i^2))TnD^^e„TnD^'e,)L2 



(61) < ^ J^||l)-ie,-||^, =C"<oo 

for some constant C < 00. Clearly C" does not depend on nor lu2. With similar arguments 
we can show that 

(62) TrL2{CvJ<C" 

where constant C" does not depend on n. By the Fubini theorem we have 

Eexp(6||(C/„,K)|lL2xL0< 



exp(6 ||u||^2)(iA^(M) • exp(6 11^11^2) dun{v) 

L2(T) \ Jl2(t) / 
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and finally due to inequalities (l60l) . (|6T] ) and (l62l ) we obtain 

Eexp(6||(C/„,K)||^.^^2) < exp(5V4a)exp(2aC")IEexp(&||K(w2)IL2) 

< exp{b^ /2a + b + 2a{C' + C")) 

for any a < | min(^, The claim follows by taking the maximum of the bounds on 

{U,V)and{Un,Vn),n£n. □ 

6. Computational example 

In this section we illustrate by a numerical example how the method produces reconstruc- 
tions with similar properties as Ambrosio-Tortorelli minimization |2, 3] in deterministic 
case. We show how the choice of e controls the edge-preserving property of our reconstruc- 
tion method. Moreover, we compute reconstructions with different choices of n to convince 
the reader that the estimates stay stable. 

6.1. The model problem. Let us consider a Bayesian deblurring problem M = AU + £ 
on T where A : L^{T) C°°(T) is the operator 

(63) Au{x) = / K{x,y)u{y)dy 

Jt 

with a priori known smooth kernel K satisfying fj. K{x\ y)dx' = K{x, y')dy' = 1 for 
all y G T. Assume also the following two properties: 

(i) the noise £ can be modeled by white noise statistics and 

(ii) the measurement projection P]^ : L^(T) PL{k) is proper in the sense of Defini- 
tion H 

As we have earlier discussed the assumptions above are related to the measurement situa- 
tion. Let us then implement the prior distributions and discretization introduced in previous 
sections. Recall mappings X^, Jn ■ PL{n) — > with N = 2^ from Section 3.3. Using 
Theorems [2] and [3] we see that the posterior density for computational model (O has the 
following form: let u = X„(u) and v = Jn{v)- Then we have 

(64) vrfcn(u, V I m) oc exp(-^F,,fe,n(ii, I rn)) 
where u, v S M^, m G R-^ and 

F,,kAu,v\m) = j^[-N \og{^ + {Qnvf) + {e' + {Qnvf)\Dqu\'' 

+e \Dv\^ + — (1 — v)"^ + \Akn'^ — dx 

where u,v & PL{n), m £ PL{k), N = 2'^ K = 2^. Due to equation ([12]) the 
computational task is then to evaluate integrals 

^kn^ ~ I u • 7rfc„(u, V I m) dudv and 

(65) vf^ = / V • 7rfc„(u, V I m) dudv. 

JR^xR^ 



6.2. Computation of the CM estimates. The integrals in equation (1651 ) are taken over a 
very large dimensional space and for that reason it is impossible to implement efficiently 
any quadrature rule. Usually in such situations different types of Markov Chain Monte 
Carlo (MCMC) methods are used to obtain a solution. In the following let us ease our 
presentation by denoting 



w 



i2N 
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The idea of MCMC algorithms is to generate a collection w^, E of samples ac- 
cording to the posterior distribution. When L is large we can approximate the CM estimates 
in (l65]l by 

where Iq stands for the number of samples in a bum-in period, i.e., the samples that do not 
explore the posterior distribution representatively and are discarded. 

The algorithm used here for generating the ensemble is an adaptive version of the Met- 
ropolis-Hastings (MH) algorithm |[24ll38ll2ni22l l44l. namely single component adaptive 
Metropolis (SCAM) algorithm introduced in ||23]| . The SCAM algorithm is similar to the 
basic single component Metropolis algorithm in the sense that a sample state, say, is 
attained by updating the coordinates separately. When deciding the coordinate a 

sample is drawn from the normal distribution AA(w^^^, crj) centered at the previous point 
with variance cjj. The difference is to update variances crj according to the rule 



Here s denotes the scaling factor for which the value s = 2.4 (see ll23l[T8l ) is used here. The 
role of 5 is to prevent the variance from shrinking to zero and a small constant {5 = 10"^) 
is used as its value. We close this section by showing in pseudo-code how the SCAM 
algorithm can be implemented. 

(1) Initialize w° e M^^ and variances (cj°)^^i. Set i := I and j := 1. 

(2) Update cr^ from formula (l67l) . 

(3) Sample tj G M from Af{0, a^) and set 

w"^- = (wf,...,wj_i,w5-i+r„w5;l,...,w^^i)^ and 



(4) If 



w°'^ = (wf,...,wj_„w^^-\w5;},...,wy)^. 

7rfc„(w"^"' I m) > 7rfe„(w°''^ | m), 
set := w^^^ + Tj; and go to 6. 



(5) Draw a random number t from the uniform distribution on [0,1]. If 



7rfc„(w"<="' I m) 



set Wj := ^ + Tj-, else set := ^. 
(6) If j < 2N, set j ^ j + 1 and go to 2; else if j = 2iV and £ < L, set £ ^ £ + 1 and 
j <— 1 and go to 2; else if j = 2N and £ = L then stop. 

6.3. Results. All computations were done using the interval [0, 1] with point 1 identified 
as 0. Here the parameter for measurement nodes is kept fixed and is chosen to be = 7, 
i.e., we have K = 2^ = 128 measurement nodes. The number of nodes for the estimates 
varies between 64 and 256, i.e., n varies between 6 and 8. See Figure 1 for the exact 
solution u G L^(T) and the measured data m^. G PL{k). The noise in the measurement 
was produced from a white noise distribution. Parameters of the MCMC computations are 
given in Table 1 ; in each case we take initial values that correspond zero function for u and 
= 1 function for v. Both Figures 2 and 3 illustrate how the results look when n is 
increased. The difference between the two figures is the choice of e; in Figure 2 we have 
chosen e = 10~^ and in Figure 3 the corresponding value is 3 x 10^^. Moreover, parameter 
q in (fT3] ) was chosen large enough in order to get quantity e'^ neglectable. 
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Table 1 . Parameters of MCMC computations. The number is the di- 
mension of reconstruction, e is the prior parameter, L — is the number 
of samples used for computing the CM estimate, r is the total acceptance 
ratio, i.e., all samples accepted vs. samples tested and the last column indi- 
cates the amount of CPU time used for computations. 



N 


e 


L-io 


r 


Time (h) 


64 


10"^ 


10'' 


0.35 


6.6 


64 


3 X 10-^ 


10^ 


0.36 


7.3 


128 


10-3 


2 X 10*^ 


0.27 


25.3 


128 


3 X IQ-"^ 


2 X 10^ 


0.33 


26.9 


256 


10-3 


2 X 10^ 


0.18 


50.6 


256 


3 X ur^ 


2 X 10*' 


0.25 


53.7 



We perform all the computations with Matlab 7.6 running in a desktop PC computer with 
an AMD Opteron 265 dual-dual processor and 8 GB of RAM. Note that the algorithm is 
not parallelized and thus only one of the processors running at l,8GHz was in full use at a 
time. 




1/3 2/3 1 1/3 2/3 

Figure 1. Left: exact solution u, Right: measurement ruk = Mk{LOo)- 



N = 64 



1/3 



2/3 



N = 128 




1/3 



2/3 




1/3 



2/3 




1/3 



2/3 




1/3 



2/3 




1/3 



2/3 



Figure 2. All the plots in this figure are obtained with the choice e = 

10-3 and k = 7. Top row: the CM estimates u'^J^ with n = 6, 7, 8 (thick 



CM 



line) and the true signal (thin line) Bottom row: the CM estimates Vj^J^ 



6.4. Discussion. We have computed the CM estimates in relatively low dimensions (high- 
est being N = 256). This is due to the long computational times of MCMC algorithms. 
The computational times can be improved with more sophisticated algorithm design, e.g.. 
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iV = 64 



1/3 



1/3 2/3 1 


1 


1 1 





2/3 




1/3 



2/3 



N = 256 




1/3 



2/3 



1/3 



2/3 



Figure 3. All the plots in this figure are obtained with the choice e = 
3 • lO"'^ and k = 7. Top row: the CM estimates u'j:^ with n = 6, 7, 8 
(thick line) and the true signal (thin line) Bottom row: the CM estimates 

CM 



parallelization. Furthermore, we expect MCMC methods to become much feasible in the 
future due to evolution of computers. 

It is evident from Figures |2] and [3] that the sharpness of edges in the CM estimates can 
be controlled via e and the CM estimates u'^^ seem stable with respect to n. The re- 
sults concerning u^^^ fit well to our expectations of the true CM estimate being a slightly 
smoothened approximation of the real signal represented in Figure [T] Considering the rela- 
tively large noise in the measurement we conclude that the method estimates the true signal 
u well. However, one can notice changes in functions v^^^ . First of all, given larger value 
of the functions v^J^^ become smoother. This phenomena is less visible with smaller 
values of e but note that we have not proved what the limiting estimates are exactly. The 
author expects this phenomena to stabilize with higher values of N but it should be checked 
in the future studies. Second, given smaller value of e the maximum of — 1| becomes 
smaller. Although the asymptotic analysis of taking e to zero was not considered in this 
paper we expect that some coupling of N and e need to be made for algorithm to work 
properly asymptotically with respect to e. In the deterministic minimization problems of 
discrete Ambrosio-Tortorelli functional one typically needs to assume that A^(e)e^ oo 
when e goes to zero (see e.g. ["4]) 

We conclude this discussion by pointing out that we have not used any ad-hoc weighting 
of the prior or likelihood information. This additional flexibility of the algorithm can be 
achieved by scaling the covariances of [/ or F with a constant. 
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